
22 research outputs found

Statistical Analysis of Fractal Image Coding and Fixed Size Partitioning Scheme

Author: Debnath Bhattacharyya
Samir Bandyopadhyay
Swalpa Kumar Roy
Tai-Hoon Kim
Publication venue: Global Journals Inc. (US)
Publication date: 04/06/2015

Fractal Image Compression (FIC) is a state-of-the-art technique for achieving high compression ratios, but it lags behind in its encoding time requirements. In this method, an image is divided into non-overlapping range blocks and overlapping domain blocks. The total number of domain blocks is larger than the number of range blocks, and each domain block is twice the size of a range block. Together, all domain blocks form a domain pool. A range block is compared with all possible domain blocks to measure similarity, so each domain block is decimated for a proper domain-range comparison. In this paper, a novel domain pool decimation and reduction technique has been developed that uses the median, instead of the mean (or average), of the domain pixel values as the measure of central tendency. However, this process is very time consuming

Global Journal of Computer Science and Technology (GJCST)
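
The decimation step described in the abstract above can be illustrated with a small sketch. It assumes that each 2x2 neighbourhood of a domain block is collapsed to a single pixel so the block matches the range block size; the function name `decimate` and the toy pixel values are hypothetical, and the paper's actual scheme may differ in detail. The sketch only contrasts the median with the mean as the reducer.

```python
from statistics import mean, median


def decimate(block, reducer=median):
    """Shrink a 2N x 2N block to N x N by reducing each 2x2 neighbourhood.

    `reducer` is the measure of central tendency applied to the four pixels
    (median by default, mean for comparison).
    """
    size = len(block)
    if size % 2 or any(len(row) != size for row in block):
        raise ValueError("block must be square with an even side length")
    return [
        [
            reducer([block[r][c], block[r][c + 1],
                     block[r + 1][c], block[r + 1][c + 1]])
            for c in range(0, size, 2)
        ]
        for r in range(0, size, 2)
    ]


# Hypothetical 4x4 domain block with two outlier pixels (200 and 250).
domain = [
    [10, 12, 200, 14],
    [11, 13, 15, 16],
    [50, 52, 90, 91],
    [51, 53, 92, 250],
]

by_median = decimate(domain, median)
by_mean = decimate(domain, mean)
print("median:", by_median)
print("mean:  ", by_mean)
```

In this toy block the median-reduced pixels stay close to the typical values of each neighbourhood, while the mean is pulled toward the outliers.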

LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks

Author: Chaudhuri Bidyut Baran
Dubey Shiv Ram
Manna Suvojit
Roy Swalpa Kumar
Publication date: 06/08/2020

The activation function in a neural network is one of the important components that facilitates deep training by introducing non-linearity into the learning process. However, because of zero-hard rectification, some existing activation functions such as ReLU and Swish fail to utilize large negative input values and may suffer from the dying gradient problem. Thus, it is important to look for a better activation function that is free from such problems. As a remedy, this paper proposes a new non-parametric function, called Linearly Scaled Hyperbolic Tangent (LiSHT), for Neural Networks (NNs). The proposed LiSHT activation function is an attempt to scale the non-linear Hyperbolic Tangent (Tanh) function by a linear function and tackle the dying gradient problem. The training and classification experiments are performed over the benchmark Iris, MNIST, CIFAR10, CIFAR100 and twitter140 datasets to show that the proposed activation achieves faster convergence and higher performance. A very promising performance improvement is observed on three different types of neural networks, including Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN) and recurrent neural networks such as Long Short-Term Memory (LSTM). The advantages of the proposed activation function are also visualized in terms of the feature activation maps, weight distribution and loss landscape. The code is available at https://github.com/swalpa/lisht. Comment: Submitted to IET Image Processin

arXiv.org e-Print Archive
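
As a rough illustration of the LiSHT abstract above, the sketch below scales Tanh by the linear function x, giving x * tanh(x), and compares it with ReLU on a few inputs. It is a scalar, standard-library sketch rather than the authors' released code; the helper names `lisht`, `lisht_grad`, and `relu` are chosen here for illustration.

```python
import math


def lisht(x):
    """Tanh scaled by the linear function x: x * tanh(x)."""
    return x * math.tanh(x)


def lisht_grad(x):
    """Derivative of x * tanh(x): tanh(x) + x * (1 - tanh(x)^2)."""
    t = math.tanh(x)
    return t + x * (1.0 - t * t)


def relu(x):
    """Standard ReLU, shown only for comparison."""
    return max(0.0, x)


for value in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(
        f"x={value:+.1f}  relu={relu(value):.4f}  "
        f"lisht={lisht(value):.4f}  grad={lisht_grad(value):+.4f}"
    )
```

For large negative inputs ReLU outputs zero with zero gradient, whereas x * tanh(x) stays non-zero with a non-zero gradient, which is the behaviour the abstract refers to when it mentions making use of large negative values.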

Deep Hyperspectral Unmixing using Transformer Network

Author: Ghosh Preetam
Koirala Bikram
Rasti Behnood
Roy Swalpa Kumar
Scheunders Paul
Publication date: 31/03/2022

Currently, this paper is under review in IEEE. Transformers have intrigued the vision research community with their state-of-the-art performance in natural language processing. With their superior performance, transformers have found their way into the field of hyperspectral image classification and achieved promising results. In this article, we harness the power of transformers to conquer the task of hyperspectral unmixing and propose a novel deep unmixing model with transformers. We aim to utilize the ability of transformers to better capture the global feature dependencies in order to enhance the quality of the endmember spectra and the abundance maps. The proposed model is a combination of a convolutional autoencoder and a transformer. The hyperspectral data is encoded by the convolutional encoder. The transformer captures long-range dependencies between the representations derived from the encoder. The data are reconstructed using a convolutional decoder. We applied the proposed unmixing model to three widely used unmixing datasets, i.e., Samson, Apex, and Washington DC Mall, and compared it with the state of the art in terms of root mean squared error and spectral angle distance. The source code for the proposed model will be made publicly available at https://github.com/preetam22n/DeepTrans-HSU.

arXiv.org e-Print Archive
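
The two evaluation measures named in the unmixing abstract above, root mean squared error and spectral angle distance, can be sketched with their standard textbook definitions. This is an assumption-laden illustration: the exact averaging the authors use over pixels or endmembers is not stated in the abstract, and the function names and toy spectra are hypothetical.

```python
import math


def rmse(estimated, reference):
    """Root mean squared error between two equal-length sequences."""
    if len(estimated) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(reference))


def spectral_angle_distance(estimated, reference):
    """Angle in radians between two spectra, insensitive to overall scale."""
    if len(estimated) != len(reference):
        raise ValueError("spectra must have the same length")
    dot = sum(e * r for e, r in zip(estimated, reference))
    norms = (math.sqrt(sum(e * e for e in estimated))
             * math.sqrt(sum(r * r for r in reference)))
    if norms == 0.0:
        raise ValueError("spectra must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norms))
    return math.acos(cosine)


# Hypothetical endmember spectrum and a brighter copy of the same shape.
reference_spectrum = [0.2, 0.4, 0.6, 0.8]
scaled_estimate = [2.0 * v for v in reference_spectrum]

print("SAD:", spectral_angle_distance(scaled_estimate, reference_spectrum))
print("RMSE:", rmse(scaled_estimate, reference_spectrum))
```

The spectral angle is close to zero for a rescaled copy of the same spectrum even though the RMSE is not, which is why the two measures are commonly reported together: one for the shape of the endmember spectra and one for absolute errors such as those in abundance maps.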

Multimodal Fusion Transformer for Remote Sensing Image Classification

Author: Chanussot Jocelyn
Deria Ankur
Hong Danfeng
Plaza Antonio
Rasti Behnood
Roy Swalpa Kumar
Publication date: 31/03/2022

Vision transformer (ViT) has been trending in image classification tasks due to its promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViT models in hyperspectral image (HSI) classification tasks, but without achieving satisfactory performance. In this paper, we introduce a new multimodal fusion transformer (MFT) network for HSI land-cover classification, which utilizes other sources of multimodal data in addition to HSI. Instead of using conventional feature fusion techniques, other multimodal data are used as an external classification (CLS) token in the transformer encoder, which helps achieve better generalization. ViT and other similar transformer models use a randomly initialized external classification token and fail to generalize well. However, the use of a feature embedding derived from other sources of multimodal data, such as light detection and ranging (LiDAR), offers the potential to improve those models by means of a CLS. The concept of tokenization is used in our work to generate CLS and HSI patch tokens, helping to learn key features in a reduced feature space. We also introduce a new attention mechanism for improving the exchange of information between HSI tokens and the CLS (e.g., LiDAR) token. Extensive experiments are carried out on widely used benchmark datasets, i.e., the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. In the results section, we compare the proposed MFT model with other state-of-the-art transformer models, classical CNN models, as well as conventional classifiers. The superior performance achieved by the proposed model is due to the use of multimodal information as external classification tokens.

arXiv.org e-Print Archive
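
The idea in the MFT abstract above of using a LiDAR-derived embedding as the classification token, instead of a randomly initialized one, can be sketched as follows. Plain scaled dot-product attention is used here as a stand-in: the abstract mentions a new attention mechanism but does not specify it, so this is not the paper's method. The names `attend`, `lidar_cls`, and `hsi_tokens`, and all numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return mixed, weights


# Toy 2-D embeddings (assumed): one CLS token built from LiDAR features,
# followed by three HSI patch tokens.
lidar_cls = [1.0, 0.0]
hsi_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# The CLS token is prepended to the patch tokens, as in a ViT-style encoder.
sequence = [lidar_cls] + hsi_tokens
updated_cls, weights = attend(lidar_cls, sequence, sequence)

print("attention weights:", [round(w, 3) for w in weights])
print("updated CLS token:", [round(v, 3) for v in updated_cls])
```

Because the CLS token starts from a meaningful LiDAR embedding, its attention weights already favour HSI patches that agree with it, rather than starting from noise.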

WetMapFormer: A unified deep CNN and vision transformer for complex wetland mapping

Author: Ali Jamali
Pedram Ghamisi
Swalpa Kumar Roy
Publication venue: Elsevier BV
Publication date: 01/06/2023

The Ramsar Convention of 1971 encourages wetland preservation, but it is unclear how climate change will affect wetland extent and related biodiversity. Due to the use of the self-attention mechanism, vision transformers (ViTs) gain better modeling of global contextual information and become a powerful alternative to Convolutional Neural Networks (CNNs). However, ViTs require enormous training datasets to activate their image classification power, and gathering training samples for remote sensing applications is typically costly. As such, in this study, we develop a deep learning algorithm called WetMapFormer, which effectively utilizes both CNN and vision transformer architectures for precise mapping of wetlands in three pilot sites around Albert County, York County, and Grand Bay-Westfield, located in New Brunswick, Canada. The WetMapFormer utilizes local window attention (LWA) rather than the conventional self-attention mechanism, improving the capability of feature generalization in a local area while substantially reducing the computational cost of vanilla ViTs. We extensively evaluated the robustness of the proposed WetMapFormer with Sentinel-1 and Sentinel-2 satellite data and compared it with various CNN and vision transformer models, which include ViT, Swin Transformer, HybridSN, CoAtNet, a multimodel network, and ResNet. The proposed WetMapFormer achieves F1 scores of 0.94, 0.94, 0.96, 0.97, 0.97, 0.97, and 1 for the recognition of aquatic bed, freshwater marsh, shrub wetland, bog, fen, forested wetland, and water, respectively. As compared to other vision transformers, the WetMapFormer limits receptive fields while adjusting translational invariance and equivariance characteristics. The codes will be made available publicly at https://github.com/aj1365/WetMapFormer.

Directory of Open Access Journals
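
The cost argument for local window attention in the WetMapFormer abstract above can be illustrated with a simplified sketch. It assumes a 1-D token sequence in which each token attends only to neighbours within a fixed radius; the paper works on 2-D image windows and its exact formulation is not given in the abstract, so the functions `local_window_attention` and `pair_count` and the `radius` parameter are hypothetical.

```python
import math


def local_window_attention(tokens, radius):
    """Each token attends only to tokens at most `radius` positions away."""
    outputs = []
    for i, query in enumerate(tokens):
        lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
        window = tokens[lo:hi]
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in window]
        # Stable softmax over the window only.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        outputs.append([
            sum(e * key[d] for e, key in zip(exps, window)) / total
            for d in range(len(query))
        ])
    return outputs


def pair_count(n, radius):
    """Number of query-key pairs scored for n tokens with the given radius."""
    return sum(min(n, i + radius + 1) - max(0, i - radius) for i in range(n))


tokens = [[float(i), 1.0] for i in range(8)]
local = local_window_attention(tokens, radius=1)

print("pairs with radius 1:", pair_count(8, 1))
print("pairs with full attention:", pair_count(8, 7))
```

With 8 tokens, a radius of 1 scores 22 query-key pairs instead of the 64 needed by full self-attention, and the gap widens as the sequence grows because the local cost is linear in the number of tokens for a fixed window.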